Prior 96w evaluation of TurboCapture-Seq v2 showed low UMIs recovered and high seq saturation. I did low throughput troubleshooting and didn’t see any issues. Process this experiment myself taking care to remove residual liquids from wash steps.
Process Hui Shi of Naik lab + Flt3 timecourse. Includes a few of my samples.
Samples
SCE object in generate in 1A_generateSCE_reads notebook. The samples were then clustered in the 2B Clustering Wt notebook
Compare different subsets sorted from dendritic cells at 5 and 7 days. The easiest contrasts to interpret are the difference from the common dendritic cell precursor (CDP).
Workflow is from:
https://bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html#differential-expression-analysis
This was generated in notebook 2B.
For the comparision of pre-pDC-like cells with the mature subsets I am
not interested in the day 7 timepoint. Remove these samples.
sce <- readRDS(here::here(
"data/TIRE_dendritic_mouse/SCEs", "DCs_cluster.sce.rds"))
sce$Time_day <- paste(sce$Cell_type, sce$Timepoint_Day, sep="_")
sce <- sce[,sce$Time_day != "pre-pDC_like_7"]
dge <- scran::convertTo(sce, type="edgeR")Have a look at the important metadata.
There are not enough replicates for the preDC contrast to be
meaningful.
tb <- dge$samples[,c("Cell_type", "Cell_number", "Ligand", "Timepoint_Day")]
tb %>%
dplyr::count(Cell_type, Ligand, Timepoint_Day)The cell type is the major difference between samples with the timepoint being much less so.
plt1 <- scater::plotPCA(sce,colour_by="Cell_type", shape_by= "Timepoint_Day") +
guides(
color = guide_legend(ncol = 2),
shape = guide_legend(ncol = 2)
) +
theme_Publication()
plt1Write the day as text.
pca_tb <- as_tibble(reducedDim(sce))
pca_tb$Cell_Type <- sce$Cell_type
pca_tb$Timepoint <- sce$Timepoint_Day
pca_tb$Timepoint <- as.factor(pca_tb$Timepoint)
plt2 <- ggplot(pca_tb, aes(x=PC1, y=PC2, colour=Cell_Type, label=Timepoint)) +
geom_text(size=5) +
xlab("PC1 (34%)") + ylab("PC2 (28%)") +
scale_colour_brewer(palette = "Dark2") +
theme_Publication()
plt2Write the day as dots
plt3 <- ggplot(pca_tb, aes(x=PC1, y=PC2, colour=Cell_Type)) +
geom_point(size=3) +
xlab("PC1 (34%)") + ylab("PC2 (28%)") +
scale_colour_brewer(palette = "Dark2") +
theme_Publication()
plt3Doing this reduces the multiple testing burden and fits variation better.
## [1] 21484 28
keep.exprs <- filterByExpr(dge, group=dge$samples$Timepoint_Day, min.count=1)
dge <- dge[keep.exprs,, keep.lib.sizes=FALSE]
dim(dge)## [1] 10612 28
Model timepoint as a numeric
## [1] 5 0 1 7
## Levels: 0 1 5 7
Build the model matrix.
Regress out the effect of the timepoint as this is minor compared to the
cell type subsets.
## Day1 Day2 Cell_typecDC2 Cell_typeCDP Cell_typepDC
## AACGAGCCTG 0.5689662 -0.0933913 0 0 0
## ACAGTAGCTC 0.0000000 0.0000000 0 0 0
## AGACGCATCG 0.3997543 -0.2581491 0 0 0
## AGTTCCGTGA 0.5689662 -0.0933913 0 0 0
## ATCAAGAACG 0.5689662 -0.0933913 0 1 0
## ATCAGCGCGA 0.3923644 0.7057267 1 0 0
## Cell_typepre-pDC_like Cell_typepreDC Cell_typeTotal_DC_culture
## AACGAGCCTG 1 0 0
## ACAGTAGCTC 0 0 1
## AGACGCATCG 0 0 1
## AGTTCCGTGA 1 0 0
## ATCAAGAACG 0 0 0
## ATCAGCGCGA 0 0 0
Decide on the contrasts. I got this answer from Perplexity ai:
The CDP (Common Dendritic cell Progenitor) is a key cell type in the dendritic cell lineage development:
This matches with the PCA where cDC samples were the most distinct.
contr.matrix <- makeContrasts(
CDPvspDC = Cell_typepDC - Cell_typeCDP,
CDPvscDC2 = Cell_typecDC2 - Cell_typeCDP,
cDC2vcpDC = Cell_typepDC - Cell_typecDC2,
cDC2vspDClike = Cell_typecDC2 - Cell_typepre.pDC_like,
pDCvspDClike = Cell_typepDC - Cell_typepre.pDC_like,
levels = colnames(sm))
contr.matrix %>%
kable()| CDPvspDC | CDPvscDC2 | cDC2vcpDC | cDC2vspDClike | pDCvspDClike | |
|---|---|---|---|---|---|
| Day1 | 0 | 0 | 0 | 0 | 0 |
| Day2 | 0 | 0 | 0 | 0 | 0 |
| Cell_typecDC2 | 0 | 1 | -1 | 1 | 0 |
| Cell_typeCDP | -1 | -1 | 0 | 0 | 0 |
| Cell_typepDC | 1 | 0 | 1 | 0 | 1 |
| Cell_typepre.pDC_like | 0 | 0 | 0 | -1 | -1 |
| Cell_typepreDC | 0 | 0 | 0 | 0 | 0 |
| Cell_typeTotal_DC_culture | 0 | 0 | 0 | 0 | 0 |
In each contrast, the format is A - B where:
Fit this model using limma voom.
The final model looks good.
par(mfrow=c(1,2))
v <- voom(dge, sm, plot=TRUE)
vfit <- lmFit(v, sm)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
plotSA(efit, main="Final model: Mean-variance trend")Check how many genes are differentially expressed
## CDPvspDC CDPvscDC2 cDC2vcpDC cDC2vspDClike pDCvspDClike
## Down 585 832 227 191 30
## NotSig 9420 8413 10216 9867 10534
## Up 607 1367 169 554 48
The write.fit function can be used to extract and write results for all three comparisons to a single output file.
These are the 2 most mature subsets.
The following info from Miltenyi
pDCs are primarily located in blood and lymphoid tissues. They depend on the E2-2 transcription factor and express B220, Siglec-H, mPDCA-1 (CD317 or Bst2), as well as intermediate levels of MHC class II, CD11c, and costimulatory molecules. pDCs are poor stimulators of T helper (TH) cells, but upon stimulation with bacterial DNA containing particular unmethylated CpG motifs or upon viral challenge, they produce large amounts of type I IFN and acquire antigen-presenting capacity (PMID: 16172135, 15728491).
cDC2 exhibit the same MHC class II and CD11c expression pattern as cDC1, but express additional markers not present on cDC1 and depend on a different transcription factor, i.e., IRF4. Resident spleen and lymph node cDC2 express CD4 and SIRPα. Migratory cDC2 that infiltrate lymph nodes can be distinguished from resident cDCs under non-inflammatory conditions by the expression of MHC class II, CD11c, and peripheral and migratory markers (e.g. CCR7 and maturation markers). cDC2 induce different responses, such as activation of ILC2 and TH2 cells against parasites and during asthma, and induction of ILC3 and TH17 immune responses to extracellular bacteria (PMID: 27760337).
cDC2vcpDC <- topTable(efit, coef=3, n=length(efit$genes$ID), sort.by = "logFC")
results <- as_tibble(cDC2vcpDC)
results$ID <- rownames(cDC2vcpDC)
# add a column of NAs
results$DElabel <- "n/s"
# if log2Foldchange > 1 or < -1 and pvalue < 0.05, set as "UP"
results$DElabel[results$logFC > 1 & results$adj.P.Val < 0.1] <- "pDC"
results$DElabel[results$logFC < -1 & results$adj.P.Val < 0.1] <- "cDC2"Add gene labels
results$genelabels <- ""
results$genelabels[results$ID == "Ly6d"] <- "Ly6d"
results$genelabels[results$ID == "Ccr9"] <- "Ccr9"
results$genelabels[results$ID == "Cd9"] <- "Cd9"
results$genelabels[results$ID == "Anxa1"] <- "Anxa1"
results$genelabels[results$ID == "Anxa2"] <- "Anxa2"
results$genelabels[results$ID == "Siglech"] <- "Siglech"
results$genelabels[results$ID == "Ctsl"] <- "Ctsl"
results$genelabels[results$ID == "Lyz2"] <- "Lyz2"plt3 <- ggplot(data=results, aes(x=logFC, y=-log10(adj.P.Val), colour=DElabel, label=genelabels)) +
geom_point(alpha=0.33, size=1.5) +
geom_text_repel(size=4, colour="black") +
guides(colour = guide_legend(override.aes = list(size=3, alpha=1))) +
scale_color_manual(values = c("darkblue", "grey", "darkorange"), name = "Upregulated") +
geom_vline(xintercept = 1, linetype="dotted") +
geom_vline(xintercept = -1, linetype="dotted") +
theme_Publication()
plt3plt2 <- ggplot(data=results, aes(x=AveExpr, y=logFC, colour=DElabel)) +
geom_point(alpha=0.75, size=1.5) +
guides(colour = guide_legend(override.aes = list(size=3, alpha=1))) +
ylab("Log fold change") + xlab("Log counts per million") +
scale_color_manual(values = c("darkblue", "grey", "darkorange"), name = "Upregulated") +
theme_Publication()
plt2Need to convert geneIDs from ensembl to enterez
geneids <- as.data.frame(v$genes$ID)
colnames(geneids) <- "ENSEMBL"
geneids$entrez <- mapIds(org.Mm.eg.db, keys = geneids$ENSEMBL, keytype = "ENSEMBL", column = "ENTREZID")Look at H hallmark gene sets. Only EMT and apoptosis is different and barely so.
load("data/MSigDB/mouse_H_v5p2.rdata")
idx <- ids2indices(Mm.H,identifiers = geneids$entrez)
cam.pDC.cDC2 <- camera(v,idx,sm,contrast=contr.matrix[,4])
head(cam.pDC.cDC2,10)Visualize as a barplot
I checked the literature and there is some unknowns about the
pre-pDC-like subset and how comitted they are to the pDC lineage.
I have samples of pre-pDCs so may as well investigate this
difference.
cDC2vspDClike <- topTable(efit, coef=4, n=length(efit$genes$ID), sort.by = "logFC")
results <- as_tibble(cDC2vspDClike)
results$ID <- rownames(cDC2vspDClike)
# add a column of NAs
results$DElabel <- "n/s"
# if log2Foldchange > 1 or < -1 and pvalue < 0.05, set as "UP"
results$DElabel[results$logFC > 1 & results$adj.P.Val < 0.1] <- "cDC2"
results$DElabel[results$logFC < -1 & results$adj.P.Val < 0.1] <- "pre-pDC-like"Add gene labels
results$genelabels <- ""
results$genelabels[results$ID == "Lpl"] <- "Lpl"
results$genelabels[results$ID == "Gpnmb"] <- "Gpnmb"
results$genelabels[results$ID == "Mpo"] <- "Mpo"
results$genelabels[results$ID == "Lyz2"] <- "Lyz2"
results$genelabels[results$ID == "Atp1b1"] <- "Atp1b1"
results$genelabels[results$ID == "Ly6d"] <- "Ly6d"
results$genelabels[results$ID == "Atp2b4"] <- "Atp2b4"
results$genelabels[results$ID == "Irf8"] <- "Irf8"plt4 <- ggplot(data=results, aes(x=logFC, y=-log10(adj.P.Val), colour=DElabel, label=genelabels)) +
geom_point(alpha=0.33, size=1.5) +
geom_text_repel(size=4, colour="black") +
guides(colour = guide_legend(override.aes = list(size=3, alpha=1))) +
scale_color_manual(values = c("darkblue", "grey", "darkorange"), name = "Upregulated") +
geom_vline(xintercept = 1, linetype="dotted") +
geom_vline(xintercept = -1, linetype="dotted") +
theme_Publication()
plt4pDCvspDClike <- topTable(efit, coef=5, n=length(efit$genes$ID), sort.by = "logFC")
results <- as_tibble(pDCvspDClike)
results$ID <- rownames(pDCvspDClike)
# add a column of NAs
results$DElabel <- "n/s"
# if log2Foldchange > 1 or < -1 and pvalue < 0.05, set as "UP"
results$DElabel[results$logFC > 1 & results$adj.P.Val < 0.1] <- "pDC"
results$DElabel[results$logFC < -1 & results$adj.P.Val < 0.1] <- "pre-pDC-like"Add gene labels
results$genelabels <- ""
results$genelabels[results$ID == "Gpnmb"] <- "Gpnmb"
results$genelabels[results$ID == "Ccr9"] <- "Ccr9"
results$genelabels[results$ID == "Sh3bp2"] <- "Sh3bp2"
results$genelabels[results$ID == "Ly6a"] <- "Ly6a"
results$genelabels[results$ID == "Bub1"] <- "Bub1"
results$genelabels[results$ID == "Gem"] <- "Gem"
results$genelabels[results$ID == "Il12a"] <- "Il12a"
results$genelabels[results$ID == "Ccnd2"] <- "Ccnd2"plt5 <- ggplot(data=results, aes(x=logFC, y=-log10(adj.P.Val), colour=DElabel, label=genelabels)) +
geom_point(alpha=0.33, size=1.5) +
geom_text_repel(size=4, colour="black") +
guides(colour = guide_legend(override.aes = list(size=3, alpha=1))) +
scale_color_manual(values = c("grey", "darkblue", "darkorange"), name = "Upregulated") +
geom_vline(xintercept = 1, linetype="dotted") +
geom_vline(xintercept = -1, linetype="dotted") +
theme_Publication()
plt5Something of interest would be common genes that go up in the differentiation from CDP in both cell types. This would not be evident in a head to head DE but can be inferred by comparing theeir DE from the CDP state.
| CDPvspDC | CDPvscDC2 | cDC2vcpDC | cDC2vspDClike | pDCvspDClike | |
|---|---|---|---|---|---|
| Down | 585 | 832 | 227 | 191 | 30 |
| NotSig | 9420 | 8413 | 10216 | 9867 | 10534 |
| Up | 607 | 1367 | 169 | 554 | 48 |
cDC2vcpDC <- topTable(efit, coef=3, n=length(efit$genes$ID), sort.by = "logFC", p.value = 0.05, lfc=1)
cDC2vpDClike <- topTable(efit, coef=4, n=length(efit$genes$ID), sort.by = "logFC", p.value = 0.05, lfc=1)
pDCvpDClike <- topTable(efit, coef=5, n=length(efit$genes$ID), sort.by = "logFC", p.value = 0.05, lfc=1)Find genes that are common DE in pDC and cDC2 as they differentiate from CDPs.
Interpretation here is:
venn_list <- list(
"cDC2 vs pDC" = row.names(cDC2vcpDC),
"cDC2 vs pre-pDC-like" = row.names(cDC2vpDClike),
"pDC vs pre-pDC-like" = row.names(pDCvpDClike)
)
ggvenn(venn_list,
show_elements = F, label_sep = "\n",
text_size = 8,
show_percentage = FALSE,
fill_color = c("navy", "springgreen4", "orchid1")
)The following code was written by Claude.ai.
Its recmondadtion was Chi Squared test
# Create the observed frequencies
group1 <- 381 # cDC2 vs pDC total (216 + 128 + 33 + 4)
group2 <- 630 # cDC2 vs pre-pDC-like total (467 + 128 + 31 + 4)
# Create a matrix of observed frequencies
observed <- c(group1, group2)
names(observed) <- c("cDC2_vs_pDC", "cDC2_vs_prePDClike")
# Perform chi-square test
result <- chisq.test(observed)
# Print results
print(result)##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 61.326, df = 1, p-value = 4.836e-15
# Calculate proportion in each group for interpretation
total <- sum(observed)
prop1 <- group1/total
prop2 <- group2/total
cat("\nProportions:\n")##
## Proportions:
## cDC2 vs pDC: 37.69 %
## cDC2 vs pre-pDC-like: 62.31 %
What are DE genes in all conditions
Pretty much all in the order of CDP > cDC2 > pDC
Must be a kind of step-wise increase in gene expression between cell types.
plotExpression(sce[,sce$Cell_type %in% c("cDC2", "pDC", "pre-pDC_like")], all_de, colour_by = "Cell_type", point_size=2) +
theme_Publication(base_size = 16) Try a visualization with x axis as cell type
dc_counts <- as_tibble(
as.matrix(counts(sce[all_de,sce$Cell_type %in% c("cDC2", "pDC", "pre-pDC_like")])),
rownames = "gene_id" # This creates a new column called 'row_id' with the rownames
) %>%
pivot_longer(-gene_id)
dc_counts <- left_join(dc_counts, dge$samples,
by = c("name" = "Well_BC"))
dc_counts$Cell_type <- recode(dc_counts$Cell_type,
"pre-pDC_like" = "pre-pDC-like")
dc_counts$Cell_type <- factor(dc_counts$Cell_type,
levels = c("pre-pDC-like", "pDC", "cDC2"))View as a boxplot
ggplot(data=dc_counts,
aes(x=Cell_type, y=value+1, fill=gene_id)) +
geom_boxplot() +
scale_y_continuous(trans='log10') +
annotation_logticks(base = 10, sides = "l") +
scale_fill_brewer(type="Qualitative", palette = "Set2") +
xlab("") + ylab("Expression(logcount)")I don’t like this either. Try a wrapped dot plot
ggplot(data=dc_counts,
aes(x=Cell_type, y=value+1, colour=Cell_type)) +
geom_boxplot() + geom_jitter() +
scale_y_continuous(trans='log10') +
annotation_logticks(base = 10, sides = "l") +
scale_colour_manual(values = c(
"cDC2" = "#228B22", # Forest green
"pDC" = "#4B0082", # Dark purple
"pre-pDC-like" = "#FF1493"
)) +
xlab("") + ylab("Expression (logcounts)") +
facet_wrap(~ gene_id, scales = "free_y", labeller = labeller(ID = label_value)) +
theme(strip.text = element_text(face = "bold.italic"))I like the wrapped boxplot
The results make sense to me. Will confirm with the domain experts.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Red Hat Enterprise Linux 9.4 (Plow)
##
## Matrix products: default
## BLAS: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRblas.so
## LAPACK: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRlapack.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid splines stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] ggthemes_5.1.0 here_1.0.1
## [3] org.Mm.eg.db_3.19.1 AnnotationDbi_1.66.0
## [5] viridis_0.6.5 viridisLite_0.4.2
## [7] ggrepel_0.9.6 ggvenn_0.1.10
## [9] knitr_1.48 patchwork_1.3.0
## [11] edgeR_4.2.2 limma_3.60.6
## [13] scater_1.32.1 scran_1.32.0
## [15] scuttle_1.14.0 lubridate_1.9.3
## [17] forcats_1.0.0 stringr_1.5.1
## [19] dplyr_1.1.4 purrr_1.0.2
## [21] readr_2.1.5 tidyr_1.3.1
## [23] tibble_3.2.1 ggplot2_3.5.1
## [25] tidyverse_2.0.0 SingleCellExperiment_1.26.0
## [27] SummarizedExperiment_1.34.0 Biobase_2.64.0
## [29] GenomicRanges_1.56.2 GenomeInfoDb_1.40.1
## [31] IRanges_2.38.1 S4Vectors_0.42.1
## [33] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
## [35] matrixStats_1.4.1
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 gridExtra_2.3
## [3] rlang_1.1.4 magrittr_2.0.3
## [5] RSQLite_2.3.7 compiler_4.4.1
## [7] DelayedMatrixStats_1.26.0 png_0.1-8
## [9] vctrs_0.6.5 pkgconfig_2.0.3
## [11] crayon_1.5.3 fastmap_1.2.0
## [13] XVector_0.44.0 labeling_0.4.3
## [15] utf8_1.2.4 rmarkdown_2.28
## [17] tzdb_0.4.0 UCSC.utils_1.0.0
## [19] ggbeeswarm_0.7.2 bit_4.5.0
## [21] xfun_0.48 bluster_1.14.0
## [23] zlibbioc_1.50.0 cachem_1.1.0
## [25] beachmat_2.20.0 jsonlite_1.8.9
## [27] blob_1.2.4 highr_0.11
## [29] DelayedArray_0.30.1 BiocParallel_1.38.0
## [31] irlba_2.3.5.1 parallel_4.4.1
## [33] cluster_2.1.6 R6_2.5.1
## [35] RColorBrewer_1.1-3 bslib_0.8.0
## [37] stringi_1.8.4 jquerylib_0.1.4
## [39] Rcpp_1.0.13 Matrix_1.7-0
## [41] igraph_2.0.3 timechange_0.3.0
## [43] tidyselect_1.2.1 rstudioapi_0.17.0
## [45] abind_1.4-8 yaml_2.3.10
## [47] codetools_0.2-20 lattice_0.22-6
## [49] KEGGREST_1.44.1 withr_3.0.1
## [51] evaluate_1.0.1 Biostrings_2.72.1
## [53] pillar_1.9.0 generics_0.1.3
## [55] rprojroot_2.0.4 hms_1.1.3
## [57] sparseMatrixStats_1.16.0 munsell_0.5.1
## [59] scales_1.3.0 glue_1.8.0
## [61] metapod_1.12.0 tools_4.4.1
## [63] BiocNeighbors_1.22.0 ScaledMatrix_1.12.0
## [65] locfit_1.5-9.10 cowplot_1.1.3
## [67] colorspace_2.1-1 GenomeInfoDbData_1.2.12
## [69] beeswarm_0.4.0 BiocSingular_1.20.0
## [71] vipor_0.4.7 cli_3.6.3
## [73] rsvd_1.0.5 fansi_1.0.6
## [75] S4Arrays_1.4.1 gtable_0.3.5
## [77] sass_0.4.9 digest_0.6.37
## [79] SparseArray_1.4.8 dqrng_0.4.1
## [81] farver_2.1.2 memoise_2.0.1
## [83] htmltools_0.5.8.1 lifecycle_1.0.4
## [85] httr_1.4.7 statmod_1.5.0
## [87] bit64_4.5.2